lectures.alex.balgavy.eu

Lecture notes from university.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-0.7385642528533936"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-06 13:42:39 +0000"/><meta name="latitude" content="52.33451453990055"/><meta name="longitude" content="4.866780680819775"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-19 00:04:32 +0000"/><title>Reliability &amp; Performance</title></head><body><div style="margin-left: 40px;"/><div style=""><span style="font-weight: bold;">How to ensure reliability</span><br/></div><div>what are the threats?
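<div>One of the threats, user error, often comes down to shell globbing. In rm * .o the stray space turns one pattern into two arguments. The following minimal Python sketch mimics that expansion with fnmatch. expand_args is a hypothetical helper, and real shells apply extra rules (hidden files, options such as nullglob) that it ignores.</div>

```python
from fnmatch import fnmatch

def expand_args(args, files):
    """Simplified shell glob expansion (hypothetical helper, not a real shell)."""
    expanded = []
    for arg in args:
        matches = sorted(f for f in files if fnmatch(f, arg))
        # bash's default: a pattern that matches nothing is passed on literally
        expanded.extend(matches if matches else [arg])
    return expanded

files = ["main.c", "main.o", "notes.txt", "util.o"]
print(expand_args(["*.o"], files))       # ['main.o', 'util.o']
print(expand_args(["*", ".o"], files))   # ['main.c', 'main.o', 'notes.txt', 'util.o', '.o']
```

<div>With the stray space, rm receives every file in the directory, plus a literal .o that it then reports as missing.</div>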
</div><ul><li><div>disk failures: bad blocks, whole-disk errors</div></li><li><div>power failures: (meta)data inconsistently written to disk</div></li><li><div>software bugs: bad (meta)data written to disk</div></li><li><div>user errors: e.g. rm * .o typed instead of rm *.o (the stray space makes * match every file); dd if=/dev/zero of=zeros bs=1M, which (with no count) keeps writing until the disk quota is full</div></li></ul><div>backups: incremental (only what changed since the last backup) vs full, online vs offline, physical (raw disk blocks) vs logical (at the filesystem level), compressed vs uncompressed, local vs remote</div><div><br/></div><div>RAID: redundant array of independent (originally inexpensive) disks
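<div>Like an MMU, a RAID controller turns a logical block number into a (disk, block) pair. The following minimal Python sketch shows that mapping for striping and mirroring, plus the XOR parity used by the hybrid layout. It assumes one-block stripe units and blocks modelled as byte strings. All function names are illustrative and do not come from a real RAID implementation.</div>

```python
from functools import reduce

def raid0_map(logical_block, n_disks):
    """Striping: consecutive logical blocks land on consecutive disks."""
    return logical_block % n_disks, logical_block // n_disks   # (disk, block on disk)

def raid1_map(logical_block, n_copies=2):
    """Mirroring: the same block number on every copy."""
    return [(disk, logical_block) for disk in range(n_copies)]

def parity(blocks):
    """Parity block: bytewise XOR of all blocks in one stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Recover the one missing data block from the survivors plus parity."""
    return parity(surviving_blocks + [parity_block])

stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # data blocks on disks 0..2
p = parity(stripe)                                  # stored on the last disk
print(raid0_map(7, 4))                              # (3, 1)
print(raid1_map(5))                                 # [(0, 5), (1, 5)]
print(rebuild([stripe[0], stripe[2]], p) == stripe[1])   # True: disk 1 recovered
```

<div>XOR parity works because x ^ x = 0. XOR-ing the surviving blocks with the parity block cancels everything except the lost block, so any single failed disk can be rebuilt.</div>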
</div><ul><li><div>virtualise addressing on top of multiple disks, presenting them as a single address space</div></li><li><div>the RAID controller translates addresses much like the MMU does for memory</div></li><li><div>options:
</div></li><ul><li><div>mirroring (RAID 1): every block is written to two different disks, so one disk can fail without data loss. Writes see no real slowdown but no advantage either, while reads can be served in parallel from the two disks.</div></li><li><div>striping (RAID 0): consecutive blocks are scattered across the disks. No reliability benefit, but very good performance, since the disks work in parallel.</div></li><li><div>hybrid (e.g. RAID 4): stripe the data across all disks but the last one, and store parity bits on the last disk.</div></li></ul><li><div><img src="Reliability%20&amp;%20Performance.resources/5D80EC7E-4975-43D9-9A86-4B7659D0DB1F.png" height="536" width="1038"/></div></li><li><div><a href="https://en.wikipedia.org/wiki/Nested_RAID_levels">Wikipedia page on nested RAID levels</a></div></li></ul><div>fsck (File System Consistency Check)
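<div>For blocks, the redundancy fsck can exploit is that the free list and the files' block lists must together cover every block exactly once. The following minimal Python sketch is a block-consistency pass that counts both. It assumes the filesystem has already been scanned into a hypothetical dict of per-file block lists plus a free list. The repair hints in the comments are the usual textbook fixes, not the behaviour of any specific fsck.</div>

```python
from collections import Counter

def check_blocks(n_blocks, files, free_list):
    """Block-consistency pass: count how often each block is in use and free."""
    in_use = Counter(b for blocks in files.values() for b in blocks)
    free = Counter(free_list)
    problems = {}
    for b in range(n_blocks):
        used, fr = in_use[b], free[b]
        if used == 0 and fr == 0:
            problems[b] = "missing"                  # usual fix: add to free list
        elif fr > 1:
            problems[b] = "duplicate in free list"   # usual fix: rebuild free list
        elif used > 1:
            problems[b] = "in use by several files"  # usual fix: give one file a copy
        elif used == 1 and fr == 1:
            problems[b] = "both in use and free"     # usual fix: drop from free list
    return problems

files = {"a.txt": [0, 1], "b.txt": [2, 1]}   # block 1 shared by two files
free_list = [4, 5, 5]                         # block 3 nowhere, block 5 twice
print(check_blocks(6, files, free_list))
# {1: 'in use by several files', 3: 'missing', 5: 'duplicate in free list'}
```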
</div><ul><li><div>fsck needs invariants to check against, so it exploits the redundancy already present in the filesystem (e.g. every block should either belong to exactly one file or appear exactly once on the free list)</div></li><li><div><img src="Reliability%20&amp;%20Performance.resources/6D574623-C739-4747-BA19-0F9A010FB0A0.png" height="566" width="1014"/></div></li></ul><div><br/></div><div><span style="font-weight: bold;">Improve filesystem performance:
</span></div><div>minimize disk access:
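<div>The main tool for minimizing disk access is the buffer cache: an LRU queue plus a hash table, combined with either write-through or periodic write-back. The following minimal Python sketch assumes a plain dict stands in for the disk. OrderedDict conveniently provides both the hash lookup and the recency order in one structure. BufferCache and sync are illustrative names.</div>

```python
from collections import OrderedDict

class BufferCache:
    """LRU block cache: front = least recently used, end = most recently used."""

    def __init__(self, disk, capacity, write_through=True):
        self.disk = disk                  # block number -> data (stand-in for the device)
        self.capacity = capacity
        self.write_through = write_through
        self.blocks = OrderedDict()       # hash lookup and LRU order in one structure
        self.dirty = set()                # modified blocks not yet written to disk

    def _touch(self, n, data):
        self.blocks[n] = data
        self.blocks.move_to_end(n)        # mark as most recently used
        if len(self.blocks) > self.capacity:
            victim, vdata = self.blocks.popitem(last=False)   # evict from the front
            if victim in self.dirty:      # flush a dirty victim before dropping it
                self.disk[victim] = vdata
                self.dirty.discard(victim)

    def read(self, n):
        data = self.blocks[n] if n in self.blocks else self.disk[n]   # hit or miss
        self._touch(n, data)
        return data

    def write(self, n, data):
        self._touch(n, data)
        if self.write_through:
            self.disk[n] = data           # persist immediately
        else:
            self.dirty.add(n)             # persisted later by sync()

    def sync(self):
        """Write back all dirty blocks, as a periodic update daemon would."""
        for n in sorted(self.dirty):
            self.disk[n] = self.blocks[n]
        self.dirty.clear()

disk = {i: f"block{i}" for i in range(10)}
cache = BufferCache(disk, capacity=2, write_through=False)
cache.read(1)
cache.read(2)
cache.write(1, "new1")
cache.read(3)                  # cache full: evicts block 2, the least recently used
print(list(cache.blocks))      # [1, 3]
print(disk[1])                 # block1 (write not synced yet)
cache.sync()
print(disk[1])                 # new1
```

<div>In write-back mode, a dirty block is also flushed when it is evicted, so an unsynced write is never silently dropped. A crash before sync() still loses it, which is exactly the reliability trade-off against write-through.</div>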
</div><ul><li><div>caching: buffer cache (recently used disk blocks), inode cache (in-memory copies of recently used inodes), directory entry cache (speeds up path name lookups)
</div></li><ul><li><div>buffer cache:
</div></li><ul><li><div>keep cached blocks in a least-recently-used (LRU) queue: the end holds the most recently used block, the front the least recently used.</div></li><li><div>evict blocks from the front; a hash table maps block numbers to queue entries, so a lookup doesn't have to walk the whole list.</div></li><li><div>write-through caching (a write updates the cached block and is immediately persisted to disk) vs. periodic syncing (modified blocks in the buffer cache are written back periodically, typically by a daemon)</div></li><li><div><img src="Reliability%20&amp;%20Performance.resources/DD35C5DC-89B7-4AE4-9868-1514F6EC5DD3.png" height="244" width="673"/></div></li></ul></ul><li><div>block read-ahead (anticipate access patterns, e.g. when a file is read sequentially, fetch block k+1 into the cache while block k is in use)</div></li></ul><div>minimize seek time (keep the disk arm in roughly the same region of the disk):
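<div>Adding up how far the disk arm moves while a file is read in order shows why contiguous allocation matters. The following rough Python sketch makes a simplifying assumption: seek cost is proportional to the distance between consecutive block numbers, and rotational delay is ignored. arm_travel and the block numbers are made up for illustration.</div>

```python
def arm_travel(block_numbers, start=0):
    """Total head movement (in blocks) to read the given blocks in order."""
    total, pos = 0, start
    for b in block_numbers:
        total += abs(b - pos)   # distance the arm moves for this block
        pos = b
    return total

contiguous = [100, 101, 102, 103]
fragmented = [100, 7, 950, 312]
print(arm_travel(contiguous, start=100))   # 3
print(arm_travel(fragmented, start=100))   # 1674 (93 + 943 + 638)
```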
</div><ul><li><div>try to allocate files contiguously</div></li><li><div>spread i-nodes over the disk, so each i-node sits close to its file's data blocks</div></li><li><div>store the data of small files 'inline' in the i-node itself, alongside the metadata</div></li><li><div>defragment the disk</div></li></ul><div><br/></div><div><span style="font-weight: bold;">Different file system options:
</span></div><div>log-structured filesystems:
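<div>A log-structured filesystem can be modelled with three pieces: an append-only log, an in-memory segment that is flushed as a whole, and an inode map recording where the newest copy of each inode sits. The following toy Python sketch uses hypothetical names (ToyLFS, SEGMENT_SIZE) and treats each log entry as a single inode write.</div>

```python
SEGMENT_SIZE = 4   # entries per segment (toy value; a real segment is e.g. 1 MB)

class ToyLFS:
    def __init__(self):
        self.log = []      # the "disk": entries are only ever appended
        self.pending = []  # in-memory segment still being filled
        self.imap = {}     # inode number -> position of its newest copy in the log

    def write_inode(self, inum, data):
        self.pending.append((inum, data))
        if len(self.pending) == SEGMENT_SIZE:
            self.flush()

    def flush(self):
        """Append the whole segment sequentially and update the inode map."""
        for inum, data in self.pending:
            self.imap[inum] = len(self.log)
            self.log.append((inum, data))
        self.pending.clear()

    def read_inode(self, inum):
        for n, data in reversed(self.pending):   # newest unflushed copy wins
            if n == inum:
                return data
        return self.log[self.imap[inum]][1]

    def clean(self):
        """Garbage collection: keep only the entries the inode map points to."""
        live = sorted(self.imap.values())
        self.log = [self.log[i] for i in live]
        self.imap = {inum: pos for pos, (inum, _) in enumerate(self.log)}

fs = ToyLFS()
for version in range(3):
    fs.write_inode(1, f"v{version}")   # three updates of inode 1
fs.write_inode(2, "x")                 # segment full: flushed to the log
print(len(fs.log), fs.read_inode(1))   # 4 v2
fs.clean()
print(len(fs.log), fs.read_inode(1))   # 2 v2
```

<div>An update never overwrites an inode in place. The old copy simply becomes garbage, which is why the cleaner is needed.</div>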
</div><ul><li><div>optimise for frequent small writes</div></li><li><div>collect pending writes in an in-memory log segment, then flush the whole segment to disk sequentially</div></li><li><div>a segment can contain anything (inodes, directory entries, data blocks) and can be e.g. 1 MB in size</div></li><li><div>relies on an inode map (index) to find the latest copy of each inode in the log efficiently</div></li><li><div>garbage collection (a cleaner) reclaims stale log entries</div></li></ul><div>journaling filesystems:
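<div>Journaling can be traced on the file-removal transaction: remove the directory entry, free the inode, free the disk blocks. The following toy Python sketch assumes a hypothetical in-memory filesystem state and a simulated crash. Every operation sets state rather than toggling it, which is what makes replaying it harmless.</div>

```python
def apply(state, op):
    """Perform one logged operation; each one sets state, so repeating it is harmless."""
    kind, arg = op
    if kind == "unlink":                   # remove file from its directory
        state["dir"].pop(arg, None)
    elif kind == "free_inode":             # release inode to the free pool
        state["free_inodes"].add(arg)
    elif kind == "free_blocks":            # return disk blocks to the free pool
        state["free_blocks"].update(arg)

def remove_file(state, journal, name, crash_at=None):
    inum = state["dir"][name]
    ops = [("unlink", name), ("free_inode", inum),
           ("free_blocks", tuple(state["blocks"][inum]))]
    journal.append(ops)                    # 1. log the transaction first
    for i, op in enumerate(ops):           # 2. then perform it
        if i == crash_at:
            return                         # simulated crash before op number crash_at
        apply(state, op)
    journal.remove(ops)                    # 3. all done: erase the log entry

def recover(state, journal):
    """After a crash, replay every logged transaction, then clear the log."""
    for ops in journal:
        for op in ops:
            apply(state, op)
    journal.clear()

state = {"dir": {"a.txt": 7}, "blocks": {7: [20, 21]},
         "free_inodes": set(), "free_blocks": set()}
journal = []
remove_file(state, journal, "a.txt", crash_at=1)   # crash right after the unlink
saved = list(journal)
recover(state, journal)
recover(state, saved)      # crash during recovery: the same ops get replayed again
print(state["dir"], state["free_inodes"], sorted(state["free_blocks"]))
# {} {7} [20, 21]
```

<div>Because each operation is idempotent, recovery does not need to know how far the original attempt got before the crash. It simply replays the whole transaction.</div>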
</div><ul><li><div>use a log (the journal) for crash recovery</div></li><li><div>first write the operations of a transaction to the log, e.g. for removing a file:
</div></li><ul><li><div>remove the file from its directory</div></li><li><div>release its inode to the pool of free inodes</div></li><li><div>return all its disk blocks to the pool of free disk blocks</div></li></ul><li><div>then perform the operations; once they all complete, erase the log entry</div></li><li><div>after a crash, replay the operations still in the log</div></li><li><div>requires single operations to be <span style="font-style: italic;">idempotent</span>, since an operation may be replayed even though it already completed</div></li><li><div>should support multiple, arbitrary crashes (including crashes during recovery)</div></li><li><div>journaling is widely used in modern filesystems (e.g. NTFS, ext3/ext4)</div></li></ul><div>virtual filesystems (VFS):
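<div>A VFS puts one interface above several concrete filesystems and uses the mount table to pick the right one for each path. The following minimal Python sketch of that dispatch uses hypothetical class names. In the Linux kernel the same idea is expressed with tables of function pointers such as struct file_operations.</div>

```python
from abc import ABC, abstractmethod

class FileSystem(ABC):
    """The common interface every concrete filesystem must implement."""
    @abstractmethod
    def read(self, path):
        ...

class DiskFS(FileSystem):
    """Hypothetical on-disk filesystem, modelled as a dict of path -> contents."""
    def __init__(self, files):
        self.files = files

    def read(self, path):
        return self.files[path]

class GeneratedFS(FileSystem):
    """Hypothetical /proc-style filesystem whose contents are generated on demand."""
    def read(self, path):
        return f"generated contents of {path}"

class VFS:
    def __init__(self):
        self.mounts = {}                     # mount point -> FileSystem

    def mount(self, point, fs):
        self.mounts[point] = fs

    def read(self, path):
        def under(point):
            return point == "/" or path == point or path.startswith(point + "/")
        point = max(filter(under, self.mounts), key=len)   # deepest mount point wins
        relative = path[len(point):].lstrip("/")
        return self.mounts[point].read(relative)             # dispatch to the real FS

vfs = VFS()
vfs.mount("/", DiskFS({"etc/hostname": "box"}))
vfs.mount("/proc", GeneratedFS())
print(vfs.read("/etc/hostname"))   # box
print(vfs.read("/proc/uptime"))    # generated contents of uptime
```

<div>User code only ever calls vfs.read. Adding another filesystem type means writing one more FileSystem subclass, with no change to the callers.</div>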
</div><ul><li><div><img src="Reliability%20&amp;%20Performance.resources/608B7B6B-85B9-47E0-83EF-D0D90872D0B9.png" height="321" width="591"/></div></li><li><div><img src="Reliability%20&amp;%20Performance.resources/92AACEFD-6BC6-474F-B02D-D4B991CF50B6.png" height="382" width="510"/></div></li></ul><div><br/></div></body></html>